
Emotion AI¶

Nutshell¶

In this project I build a program that classifies emotions from images of human faces, following the course Modern Artificial Intelligence, taught by Dr. Ryan Ahmed, Ph.D., MBA.

The data set I use is from https://www.kaggle.com/c/facial-keypoints-detection/overview and consists of over 20,000 facial images that have been labeled with a facial expression/emotion, and approximately 2,000 images with keypoint annotations.

The program trains two models which will

  1. detect facial keypoints
  2. detect emotions.

These models are then combined into one model that outputs both the keypoints and the emotion.

A short recap of artificial neural networks¶

Artificial neurons are built in a similar way to biological neurons. An artificial neuron takes in signals through input channels (the dendrites of a biological neuron), processes the information through a transfer function (the cell body), and generates an output (which in a biological neuron would travel along the axon).


Fig. 1. Side by side view of artificial and biological neurons. Credit: Top image from Introduction to Psychology (A critical approach) Copyright © 2021 by Rose M. Spielman; Kathryn Dumper; William Jenkins; Arlene Lacombe; Marilyn Lovett; and Marion Perlmutter, licensed under a Creative Commons Attribution 4.0 International License. Bottom image by Chrislb, CC BY-SA 3.0, via Wikimedia Commons.

For example, let's consider an artificial neuron (AN) that takes three inputs: $x_1$, $x_2$, and $x_3$. We can then express the output of the artificial neuron mathematically as $y = \phi(x_1w_1 + x_2w_2 + x_3w_3 + b)$. Here $y$ is the output, the $w_i$ are the weights assigned to each input signal, $b$ is a bias term added to the weighted sum of inputs, and $\phi$ is the activation function.

Some common modern activation functions used in neural networks are ReLU, GELU, and the logistic activation function. ReLU is short for Rectified Linear Unit and is defined as $\phi(x) = \max(0, x)$. ReLU is recommended for the hidden layers, since it outputs a linear response for positive values. This helps maintain larger gradients and makes training deep networks more feasible.

The Gaussian Error Linear Unit (GELU) is a smoother version of ReLU and is defined as $x\Phi(x)$, where $\Phi(x)$ is the Gaussian cumulative distribution function.

The logistic activation function is also called the sigmoid function and is defined as $\phi(x) = \frac{1}{1+e^{-x}}$. It maps any real number into the interval $(0, 1)$ and is therefore very useful in output layers.
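As a minimal sketch, the three activation functions can be evaluated directly in plain Python (the GELU here uses the exact Gaussian CDF via the error function):

```python
import math

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def gelu(x):
    """GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid(x):
    """Logistic / sigmoid: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}  sigmoid={sigmoid(x):.4f}")
```

Note how ReLU zeroes out negative inputs, GELU lets small negative values through smoothly, and the sigmoid squashes everything into $(0, 1)$.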


Training¶

All neural networks need to be trained with labeled data. The available data is generally divided into 80% training and 20% testing data. It is also recommended to further divide the training data into an actual training set (e.g. 60%) and a validation set (e.g. 20%).

Training is done by adjusting the weights of the network, iteratively minimising the cost function with, for example, the gradient descent optimization algorithm. It works by calculating the gradient of the cost function and then taking a step in the negative gradient direction, repeating until it reaches a local or global minimum.

A typical choice for a cost function is the quadratic (mean squared error) loss, which is formulated as $f_{loss}(w,b)= \frac{1}{N}\sum^{N}_{i=1}(\hat y_i-y_i)^2$.

Gradient descent algorithm:

1. Derive the gradient of the loss function, $\frac{\partial f_{loss}}{\partial w}$.

2. Pick random initial values for the weights and substitute them into the gradient.

3. Calculate the step size, i.e. how much we will update the weights:

step size = learning rate * gradient $=\alpha\frac{\partial f_{loss}}{\partial w}$

4. Update the parameters and repeat:

new weight = old weight - step size, i.e. $w_{new}=w_{old}-\alpha\frac{\partial f_{loss}}{\partial w}$
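The steps above can be sketched in NumPy. Here a hypothetical one-parameter model $\hat y = wx$ is fitted with the quadratic loss (the data and variable names are illustrative only):

```python
import numpy as np

# Toy data generated from y = 3x; gradient descent should recover w close to 3.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y = 3.0 * x

w = rng.normal()                            # step 2: random initial weight
alpha = 0.1                                 # learning rate

for _ in range(200):
    grad = np.mean(2.0 * (w * x - y) * x)   # step 1: d(MSE)/dw
    w = w - alpha * grad                    # steps 3-4: step size and update

print(w)
```

Because the loss is a simple quadratic in $w$, each update shrinks the error by a constant factor and the iteration converges quickly.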

Below is an example of searching for the minimum of a U-shaped function with gradient descent. Usually the problem is multidimensional, but this simplification is solved in the same way.


Testing various learning rates helps to understand the importance of choosing the training parameters.


As shown above, too large a learning rate can lead to missing the global minimum and/or slower convergence. Equally problematic are too small learning rates, with which the model barely learns. To solve the problems arising from too small or too large learning rates, there are several approaches that adjust the learning rate dynamically.

Momentum is analogous to a ball's tendency to keep rolling downhill. Momentum is used to speed up learning when the cost gradient keeps pointing in the same direction for a long time, and to slow it down when a level area is reached. Momentum is controlled by a parameter analogous to the mass of the rolling ball. A large momentum helps avoid getting stuck in local minima, but might also push through the minimum we wish to find. Thus, the parameter has to be selected carefully.

Learning rates can also be adjusted through decay, which reduces the learning rate by a certain amount after a fixed number of epochs. This helps in situations like the one above, where too large a learning rate makes the optimization jump back and forth over a minimum.

Adagrad and Adam are examples of popular adaptive optimization algorithms built on gradient descent.
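A minimal sketch combining the two ideas above, a momentum update plus step decay of the learning rate, on a toy quadratic (the function, hyperparameters, and names are illustrative, not the settings used later in this project):

```python
def sgd_momentum(grad_fn, w0, alpha=0.1, mu=0.9, decay=0.5, decay_every=50, steps=150):
    """Gradient descent with momentum plus periodic step decay of the learning rate."""
    w, v = w0, 0.0
    for t in range(1, steps + 1):
        v = mu * v - alpha * grad_fn(w)   # momentum: keep a fraction of the previous step
        w = w + v
        if t % decay_every == 0:
            alpha *= decay                # decay: shrink the learning rate periodically
    return w

# Minimum of f(w) = (w - 2)^2 is at w = 2; the gradient is f'(w) = 2(w - 2).
w_star = sgd_momentum(lambda w: 2.0 * (w - 2.0), w0=10.0)
print(w_star)
```

The momentum term lets the iterate coast through flat stretches, while the decaying step size damps the oscillation around the minimum.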

Network architectures¶

The artificial neurons are connected to each other to form neural networks and a plethora of different network architectures exist. To harness the power of AI, it is necessary to know which architecture serves the intended purpose best. Below are three common architectures and their applications.

Recurrent Neural Networks (RNNs) handle sequential data by maintaining a hidden state that captures information about previous elements in the sequence. Therefore they are great for contexts where the output depends on previous inputs, for example time series and natural language processing.

Generative Adversarial Networks (GANs) consist of two neural networks - the Generator and the Discriminator. They spar with each other in a zero-sum game framework, where the generator creates synthetic data that resembles real data and the discriminator evaluates whether it is real or not. This drives the generator to output increasingly realistic data. GANs are the obvious choice for much of image generation and editing, but also for anomaly detection in industrial and security contexts: GANs can model regular patterns and subsequently detect anomalies by comparing generated outputs with real inputs.

Convolutional Neural Networks (CNN) are designed to process data with a grid-like topology and are most commonly used in image analysis. They utilise convolutional layers to learn spatial hierarchies by applying filters (kernels) that slide (convolve) over the input. They usually involve pooling layers that reduce the spatial dimensions and fully connected layers that map the extracted features to outputs.


Fig. 2. Convolutional neural network. Credit: Aphex34, CC BY-SA 4.0, via Wikimedia Commons

In the Emotion AI, I will use a Residual Neural Network (ResNet). ResNet's architecture includes "skip connections", which enable training very deep networks without vanishing gradient issues. The vanishing gradient problem occurs when the gradient becomes very small as it is back-propagated to the earlier layers. A skip connection works by passing the input of one layer to a layer further down in the network. This is also called identity mapping. The ResNet model that I use has been pretrained on the ImageNet dataset.


Fig. 3. Identity mapping. Credit: LunarLullaby, CC BY-SA 4.0, via Wikimedia Commons
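The identity mapping can be sketched in a few lines of NumPy. The "layer" here is a hypothetical fixed linear map standing in for a conv/batch-norm stack:

```python
import numpy as np

def residual_block(x, layer_fn):
    """Identity mapping: add the block's input to its transformed output,
    so the gradient has an unobstructed path through the '+ x' term."""
    return np.maximum(0.0, layer_fn(x) + x)   # ReLU(F(x) + x)

# Hypothetical 'layer': a fixed linear map standing in for conv + batch norm.
W = np.array([[0.1, -0.2],
              [0.3, 0.05]])
x = np.array([1.0, 2.0])
print(residual_block(x, lambda v: W @ v))   # F(x) = [-0.3, 0.4], so output = [0.7, 2.4]
```

Even if the learned transform $F(x)$ contributes almost nothing, the block still passes $x$ through unchanged, which is what keeps gradients alive in very deep stacks.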

Part 1. Key facial points detection¶

In this section I program a deep learning model with convolutional layers and residual blocks to predict facial keypoints. The data set is from https://www.kaggle.com/c/facial-keypoints-detection/overview.

The dataset consists of input images with 15 facial keypoints each. The training.csv file has 7049 face images with corresponding keypoint locations. The test.csv file has face images only, and will be used to test the model. The images come as an array of strings of numbers, in the shape (2140,). Each string has to be transformed into the real shape of the image, (96, 96). Thus we create a 1-D array from the string and reshape it into a 2-D array.
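The string-to-image conversion can be sketched as follows (the pixel string here is a shortened 4×4 stand-in; each real string holds 96*96 = 9216 values):

```python
import numpy as np

# Hypothetical shortened example: a 4x4 image stored as one space-separated string.
pixel_string = "0 255 128 64 10 20 30 40 50 60 70 80 90 100 110 120"

img = np.array(pixel_string.split(), dtype=np.float32)  # 1-D array of pixel values
img = img.reshape(4, 4)                                 # real data: .reshape(96, 96)
print(img.shape)
```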

The model I build has the architecture presented below. The Resblock consists of two different types of blocks: a convolution block and an identity block. As seen below, both of them have an additional short path that adds the original input to the output. For the convolution block this includes a few extra steps to shape the input to the same dimensions as the output of the longer path.

Figures: Final model architecture; Resblock architecture.

As a sanity check for the data, I visualise 64 randomly chosen images along with their key facial points.


Image augmentation¶

Here I create an additional data set where the images are changed slightly to improve the generalisation of the final AI model. The idea is to get more data and more variability in e.g. orientation, lighting conditions, or size of the images. This reduces the likelihood of overfitting and ensures that the model learns the meaningful "concepts" of emotion recognition. I create 4 types of augmented images:

  1. horizontal flipping
  2. randomly increasing brightness
  3. vertical flipping
  4. rotation with random angle
The resulting augmented data set has shape (8560, 31).
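As an illustration of the bookkeeping involved, here is a sketch of horizontal flipping: the image is mirrored and the keypoint x-coordinates must be mirrored with it (the width−1−x convention used here is one common choice; some implementations use width−x):

```python
import numpy as np

def flip_horizontal(img, keypoints):
    """Mirror an image left-right and adjust the keypoint x-coordinates.
    `keypoints` alternates (x1, y1, x2, y2, ...); only the x values change."""
    flipped = img[:, ::-1]
    kp = keypoints.copy()
    kp[0::2] = img.shape[1] - 1 - kp[0::2]   # mirror x, leave y untouched
    return flipped, kp

img = np.arange(96 * 96, dtype=np.float32).reshape(96, 96)
kp = np.array([10.0, 30.0, 80.0, 30.0])      # two illustrative keypoints
f_img, f_kp = flip_horizontal(img, kp)
print(f_kp)                                  # x-coordinates mirrored to 85 and 15
```

Note that vertical flipping would mirror the y-coordinates analogously, while a brightness change leaves the keypoints untouched.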

Data normalization and scaling¶

Normalizing the image pixel values to range 0 - 1:

# Normalize pixel values to the range 0-1, then split the data into train and test data
img_array = img_array / 255.0
X_train_kp, X_test_kp, y_train_kp, y_test_kp = train_test_split(img_array, img_target, test_size=0.2, random_state=42)
(6848, 96, 96, 1)   # X_train_kp
(1712, 96, 96, 1)   # X_test_kp
(1712, 30)          # y_test_kp
(6848, 30)          # y_train_kp

Building the Residual Neural Network model for key facial points detection¶

Kernels are used to modify the input by sweeping it over the original input as shown in this animation:

2D Convolution Animation

Fig. 4 Performing a convolution on 6x6 input with a 3x3 kernel using stride 1x1. Credit: Michael Plotke, CC BY-SA 3.0, via Wikimedia Commons.

For example, a 2D convolution command:

X = Conv2D(filters=64, kernel_size=(7,7), strides=(2,2), kernel_initializer = glorot_uniform(seed=0))(X_input)

The above function defines the following:

  • use 64 distinct filters (each one is a trainable 7×7 “weight grid”).
  • use stride 2x2, i.e., the filter jumps 2 pixels at a time, effectively “skipping” every other location.
  • initialise the kernels with the glorot_uniform method, aka Xavier uniform initialization. This draws samples from a uniform distribution within a range determined by the number of input and output units.
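The spatial output size of such a convolution follows the standard formula $\lfloor(n + 2p - k)/s\rfloor + 1$. A quick sketch to check the numbers (the 3-pixel padding corresponds to the ZeroPadding2D step used in the final model below):

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# Fig. 4: a 6x6 input with a 3x3 kernel and stride 1 gives a 4x4 output.
print(conv_output_size(6, kernel=3, stride=1))

# The Conv2D above: 96x96 input, zero-padded by 3 on each side, 7x7 kernel, stride 2.
print(conv_output_size(96, kernel=7, stride=2, padding=3))
```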

The section below defines the model architecture using Keras.

# @title Resblock

def res_block(X, filter, stage):
  """
  Implementation of the Resblock.

  Arguments:
  X -- input tensor
  filter -- tuple/list of three integers (f1, f2, f3), the number of filters for each conv layer
  stage -- string, used to name the layers uniquely

  Returns:
  X -- output tensor of the res block
  """
  ### 1: Convolutional block###
  # Make a copy of the input
  X_shortcut = X

  f1, f2, f3 = filter

  # ----Long (main) path-----
  # Conv2d
  X = Conv2D(f1, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_a', \
             kernel_initializer = glorot_uniform(seed=0))(X)
  # MaxPool2D
  X = MaxPool2D(pool_size=(2,2))(X)
  # BatchNorm,ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_a')(X)
  X = Activation('relu')(X)

  # Conv2D (kernel 3x3)
  X = Conv2D(f2, kernel_size = (3,3), strides = (1,1), padding = 'same', name=str(stage)+'convblock'+'_conv_b', \
            kernel_initializer = glorot_uniform(seed=0))(X)
  # BatchNorm, ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_b')(X)
  X = Activation('relu')(X)

  #Conv2D
  X = Conv2D(f3, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_c', \
             kernel_initializer = glorot_uniform(seed=0))(X)
  #BatchNorm, ReLU
  X = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_c')(X)


  # ----Short path----

  # Conv2D
  X_shortcut = Conv2D(f3, kernel_size = (1,1), strides = (1,1), name=str(stage)+'convblock'+'_conv_short', \
                      kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
  # MaxPool2D and Batchnorm
  X_shortcut = MaxPool2D(pool_size=(2,2))(X_shortcut)
  X_shortcut = BatchNormalization(axis = 3, name=str(stage)+'convblock'+'_bn_short')(X_shortcut)


  # ----Add Paths together----
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  ### 2: Identity block 1 ###
  # Save the input value (shortcut path)
  X_shortcut = X
  block = 'iden1'
  # First component: Conv2D -> BatchNorm -> ReLU
  X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
  X = Activation('relu')(X)

  # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
  X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
  X = Activation('relu')(X)

  # Third component: Conv2D (1x1) -> BatchNorm
  X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)

  # Add shortcut value to the main path
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  ### 3: Identity block 2 ###
   # Save the input value (shortcut path)
  X_shortcut = X
  block = 'iden2'
  # First component: Conv2D -> BatchNorm -> ReLU
  X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
  X = Activation('relu')(X)

  # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
  X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
  X = Activation('relu')(X)

  # Third component: Conv2D (1x1) -> BatchNorm
  X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c', \
             kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)

  # Add shortcut value to the main path
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)

  return X

Next build the final model.

# @title Final Resnet Neural Network model

input_shape = (96,96,1)

# Input tensor shape
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# Stage 1
X = Conv2D(filters = 64, kernel_size = (7,7), strides = (2,2), name='conv1', \
           kernel_initializer = glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)

# Stage 2
X = res_block(X, filter =  [64, 64, 256], stage = 'res1')

# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res2')

# We could also add more resblocks if we want
# X = res_block(X, filter= [256,256,1024], stage= 'res3')

# Average pooling
X = AveragePooling2D((2,2), name = 'avg_pool')(X)

# Flatten
X = Flatten()(X)

# Dense, ReLU, Dropout
X = Dense(4096, activation = 'relu')(X)
X = Dropout(0.2)(X)
X = Dense(2048, activation = 'relu')(X)
X = Dropout(0.1)(X)
X = Dense(30, activation = 'relu')(X)

model_1_facialKeyPoints = Model(inputs = X_input, outputs = X)

Explanations of components¶

The ZeroPadding2D layer adds a border of zeros (3 pixels wide) around the input image. This prevents information loss at the edges during convolutions.

Conv2D is the cake base of the convolutional model. It applies the filters to the input image, sliding them with the set stride; this is how features are extracted from the image.

The BatchNormalisation layer normalizes the output of the convolution, making training more stable. We can say it is the smooth cream layer on our convolution cake.

The ReLU activation function introduces non-linearity to the model.

MaxPooling2D reduces the spatial dimensions of the feature maps by taking the maximum value in a window, thus downsampling the output. After the Resblock, AveragePooling2D is used similarly to MaxPooling, except that it takes the average value within the window; it also reduces the size of the feature maps. To give an impression of the impact of pooling: if we removed the MaxPooling2D layers from the Resblocks, the final model would have 256 million parameters - instead of 18 million.

Flatten converts the multi-dimensional feature maps into a single, long vector, preparing the data for the fully connected layers.

Dense creates a fully connected layer where each neuron is connected to every neuron in the previous layer. These fully connected layers process the features extracted by the convolutional layers.

Dropout layers are a regularisation technique which drops a set percentage of the neurons during training by setting them to zero. This makes the model less likely to overfit, and decreases the interdependency between the neurons. Therefore we improve the performance of the network and the generalisability of the model.

The final model has a very complex structure with 18 million trainable parameters, which allows it to learn to identify emotions as well as, or even better than, the average human. However, too many parameters can lead to problems, such as overfitting and slow or non-converging training. Optimising this many parameters is not a trivial task.

Compiling and training the model¶

I will use the Adam optimization method for the training. Adam is a computationally efficient stochastic gradient method that combines gradient descent with momentum and the RMSP algorithm.

As discussed earlier, momentum speeds up training by accelerating the gradients, adding a fraction of the previous gradient to the current one. RMSP, or Root Mean Square Propagation, is an adaptive learning algorithm that takes an exponential moving average of the squared gradients. In other words, it adapts the learning rate for each parameter by keeping track of an exponentially decaying average of past squared gradients.

The algorithm proceeds as follows:

1. Calculate the gradient $g_t$

$g_t = \frac{\partial L}{\partial w_t}$

2. Update the Biased first moment estimate $m_t$

$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$

This is similar to calculating the momentum as we keep track of the decaying average of past gradients.

3. Update the Biased Second Moment Estimate $v_t$

$v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$

This is similar to RMSP as we keep track of an exponentially decaying average of past squared gradients.

4. Bias correction for $m_t$ and $v_t$

Especially at the beginning of training, $m_t$ and $v_t$ are biased toward zero (because they are initialised at zero). Adam corrects this as follows:

$\hat m = \frac{m_t}{1-\beta_1^t}$, $\hat v = \frac{v_t}{1-\beta_2^t}$

5. Parameter update

$w_{t} = w_{t-1} - \alpha_t\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon}$

where,

$g_t$ = gradient of the loss with respect to the parameters at iteration $t$

$\alpha_t$ = learning rate at iteration $t$

$\beta_1, \beta_2$ = decay rates for the moment estimates

$\epsilon$ = small constant to prevent division by zero
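Steps 1-5 can be sketched in NumPy as follows. The objective is a hypothetical toy quadratic; the hyperparameter defaults match the common Adam settings:

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, following steps 1-5 above."""
    m = beta1 * m + (1 - beta1) * grad              # step 2: biased first moment
    v = beta2 * v + (1 - beta2) * grad ** 2         # step 3: biased second moment
    m_hat = m / (1 - beta1 ** t)                    # step 4: bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)  # step 5: parameter update
    return w, m, v

# Toy objective f(w) = (w - 5)^2 with gradient 2(w - 5).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2.0 * (w - 5.0), m, v, t, alpha=0.01)
print(w)
```

Because the update divides by the running RMS of the gradients, the effective step size is roughly $\alpha$ regardless of the gradient's scale.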

The TensorFlow implementation of the Adam optimizer accepts several arguments:

learning_rate: a float, or a scheduler that adapts the learning rate.

beta_1: a float value or constant float tensor, the exponential decay rate for the 1st moment estimates, i.e. the means of the gradients. Default 0.9.

beta_2: a float value or constant float tensor, the exponential decay rate for the 2nd moment estimates, i.e. the uncentered variances of the gradients. Default 0.999.

amsgrad: True/False, whether to apply the AMSGrad variant of the algorithm from the paper On the Convergence of Adam and Beyond. Default False.

weight_decay: if set, applies the given weight decay to the parameters.

Other things to consider when optimising¶

The batch size determines how many training examples are processed before the model's internal parameters are updated. Smaller batch sizes can speed up the training per epoch because the model updates more frequently. However, this can lead to less stable convergence, i.e. the training loss may fluctuate more. A small batch size can be beneficial when the model is overfitting (the training loss is significantly lower than the validation loss).

A larger batch size leads to slower training per epoch and requires more memory, but can yield more stable parameter updates. The model usually converges more smoothly, but might not generalise as well due to "sharp minima".

Another way to tune the parameters of optimization is to use learning rate schedulers. Why? As training progresses, the model gets closer to a good solution. Smaller learning rates allow for finer adjustments to the model's weights, helping it converge to a better minimum without overshooting (see the gradient descent examples in the beginning). I have implemented a learning rate algorithm that reduces the learning rate if the validation loss does not improve in 5 epochs.
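The reduce-on-plateau logic I use can be sketched in plain Python. The factor 0.65 is inferred from the 8.0e-4 → 5.2e-4 drop visible in the training log; treat it and the function name as illustrative:

```python
def reduce_on_plateau(val_losses, lr=8e-4, factor=0.65, patience=5):
    """Return the final learning rate after applying the schedule:
    if val_loss has not improved for `patience` epochs, multiply lr by `factor`."""
    best, wait = float("inf"), 0
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0   # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:   # plateau: shrink the learning rate
                lr *= factor
                wait = 0
    return lr

# Five consecutive non-improving epochs trigger one reduction: 8.0e-4 -> 5.2e-4.
print(reduce_on_plateau([10.0, 11.0, 11.0, 11.0, 11.0, 11.0]))
```

In Keras this behaviour is provided by the tf.keras.callbacks.ReduceLROnPlateau callback (arguments monitor, factor, patience), passed to model.fit in the callbacks list.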

After training, the model is saved in a .keras file. The .keras file is a zip archive that contains:

  • The architecture
  • The weights
  • The optimizer's state
# @title Compiling and training with 3 epochs
run_example = True
if run_example:
  adam = tf.keras.optimizers.Adam(learning_rate = 0.0001, beta_1 = 0.9, \
                                  beta_2 = 0.999, amsgrad = False)
  model_3_facialKeyPoints = Model(inputs = X_input, outputs = X)
  model_3_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, \
                                  metrics = ['accuracy'])

  #Save the best model with least validation loss here
  checkpoint  = ModelCheckpoint(filepath = "Models/FacialKeyPoints_model_16-12-2025.keras", \
                                verbose = 1, save_best_only = True)

  history3 = model_3_facialKeyPoints.fit(X_train_kp, y_train_kp, batch_size = 32, \
                    epochs = 3, validation_split = 0.05, callbacks=[checkpoint])
Epoch 1/3
204/204 ━━━━━━━━━━━━━━━━━━━━ 0s 76ms/step - accuracy: 0.5066 - loss: 511.2633
Epoch 1: val_loss improved from inf to 694.32581, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
204/204 ━━━━━━━━━━━━━━━━━━━━ 54s 128ms/step - accuracy: 0.5069 - loss: 509.7457 - val_accuracy: 0.5627 - val_loss: 694.3258
Epoch 2/3
202/204 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6501 - loss: 27.1448
Epoch 2: val_loss improved from 694.32581 to 131.69623, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
204/204 ━━━━━━━━━━━━━━━━━━━━ 4s 19ms/step - accuracy: 0.6502 - loss: 27.1052 - val_accuracy: 0.5860 - val_loss: 131.6962
Epoch 3/3
204/204 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6906 - loss: 18.1163
Epoch 3: val_loss improved from 131.69623 to 27.90438, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
204/204 ━━━━━━━━━━━━━━━━━━━━ 4s 18ms/step - accuracy: 0.6906 - loss: 18.1128 - val_accuracy: 0.7697 - val_loss: 27.9044
Epoch 1/100
102/102 ━━━━━━━━━━━━━━━━━━━━ 0s 148ms/step - accuracy: 0.5404 - loss: 895.0652
Epoch 1: val_loss improved from inf to 84.35088, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 49s 223ms/step - accuracy: 0.5409 - loss: 889.3439 - val_accuracy: 0.6822 - val_loss: 84.3509 - learning_rate: 8.0000e-04
Epoch 2/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.6552 - loss: 24.0586
Epoch 2: val_loss improved from 84.35088 to 26.11082, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.6558 - loss: 24.0017 - val_accuracy: 0.7784 - val_loss: 26.1108 - learning_rate: 8.0000e-04
Epoch 3/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.6976 - loss: 17.8676
Epoch 3: val_loss improved from 26.11082 to 10.68048, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.6978 - loss: 17.8524 - val_accuracy: 0.8047 - val_loss: 10.6805 - learning_rate: 8.0000e-04
Epoch 4/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7243 - loss: 14.8862
Epoch 4: val_loss improved from 10.68048 to 9.54454, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.7245 - loss: 14.8785 - val_accuracy: 0.8338 - val_loss: 9.5445 - learning_rate: 8.0000e-04
Epoch 5/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7542 - loss: 13.2277
Epoch 5: val_loss did not improve from 9.54454
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.7543 - loss: 13.2243 - val_accuracy: 0.8280 - val_loss: 10.5135 - learning_rate: 8.0000e-04
Epoch 6/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7533 - loss: 12.6656
Epoch 6: val_loss did not improve from 9.54454
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.7535 - loss: 12.6735 - val_accuracy: 0.8163 - val_loss: 9.8618 - learning_rate: 8.0000e-04
Epoch 7/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7619 - loss: 11.5468
Epoch 7: val_loss did not improve from 9.54454
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.7620 - loss: 11.5456 - val_accuracy: 0.8455 - val_loss: 11.2517 - learning_rate: 8.0000e-04
Epoch 8/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7806 - loss: 10.9133
Epoch 8: val_loss did not improve from 9.54454
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.7803 - loss: 10.9111 - val_accuracy: 0.8280 - val_loss: 14.5672 - learning_rate: 8.0000e-04
Epoch 9/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7756 - loss: 10.6394
Epoch 9: val_loss improved from 9.54454 to 7.74191, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.7757 - loss: 10.6384 - val_accuracy: 0.8367 - val_loss: 7.7419 - learning_rate: 8.0000e-04
Epoch 10/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7893 - loss: 10.3817
Epoch 10: val_loss did not improve from 7.74191
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.7893 - loss: 10.3806 - val_accuracy: 0.8513 - val_loss: 11.8877 - learning_rate: 8.0000e-04
Epoch 11/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7864 - loss: 10.5088
Epoch 11: val_loss improved from 7.74191 to 7.10119, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 4s 34ms/step - accuracy: 0.7863 - loss: 10.5075 - val_accuracy: 0.8280 - val_loss: 7.1012 - learning_rate: 8.0000e-04
Epoch 12/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7827 - loss: 10.1306
Epoch 12: val_loss improved from 7.10119 to 6.29367, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.7828 - loss: 10.1247 - val_accuracy: 0.8367 - val_loss: 6.2937 - learning_rate: 8.0000e-04
Epoch 13/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7932 - loss: 10.0304
Epoch 13: val_loss did not improve from 6.29367
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.7934 - loss: 10.0245 - val_accuracy: 0.8688 - val_loss: 6.3245 - learning_rate: 8.0000e-04
Epoch 14/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8002 - loss: 8.6506
Epoch 14: val_loss improved from 6.29367 to 5.73820, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8001 - loss: 8.6591 - val_accuracy: 0.8630 - val_loss: 5.7382 - learning_rate: 8.0000e-04
Epoch 15/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7880 - loss: 9.1423
Epoch 15: val_loss improved from 5.73820 to 5.19746, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.7883 - loss: 9.1266 - val_accuracy: 0.8367 - val_loss: 5.1975 - learning_rate: 8.0000e-04
Epoch 16/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8040 - loss: 7.7693
Epoch 16: val_loss did not improve from 5.19746
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8039 - loss: 7.7827 - val_accuracy: 0.8571 - val_loss: 7.8125 - learning_rate: 8.0000e-04
Epoch 17/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.7935 - loss: 8.1466
Epoch 17: val_loss did not improve from 5.19746
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.7937 - loss: 8.1353 - val_accuracy: 0.8601 - val_loss: 5.4262 - learning_rate: 8.0000e-04
Epoch 18/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8138 - loss: 8.5209
Epoch 18: val_loss improved from 5.19746 to 4.91526, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8137 - loss: 8.5156 - val_accuracy: 0.8659 - val_loss: 4.9153 - learning_rate: 8.0000e-04
Epoch 19/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8156 - loss: 8.0416
Epoch 19: val_loss did not improve from 4.91526
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8156 - loss: 8.0426 - val_accuracy: 0.8776 - val_loss: 5.1973 - learning_rate: 8.0000e-04
Epoch 20/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8213 - loss: 7.6001
Epoch 20: val_loss did not improve from 4.91526
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8212 - loss: 7.6102 - val_accuracy: 0.8513 - val_loss: 5.6242 - learning_rate: 8.0000e-04
Epoch 21/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8136 - loss: 8.3246
Epoch 21: val_loss improved from 4.91526 to 4.54217, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8135 - loss: 8.3226 - val_accuracy: 0.8746 - val_loss: 4.5422 - learning_rate: 8.0000e-04
Epoch 22/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8227 - loss: 7.1771
Epoch 22: val_loss improved from 4.54217 to 4.39473, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8225 - loss: 7.1885 - val_accuracy: 0.8455 - val_loss: 4.3947 - learning_rate: 8.0000e-04
Epoch 23/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8195 - loss: 7.8114
Epoch 23: val_loss did not improve from 4.39473
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8195 - loss: 7.8081 - val_accuracy: 0.8834 - val_loss: 5.3614 - learning_rate: 8.0000e-04
Epoch 24/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8143 - loss: 7.7779
Epoch 24: val_loss did not improve from 4.39473
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8144 - loss: 7.7859 - val_accuracy: 0.8776 - val_loss: 5.1739 - learning_rate: 8.0000e-04
Epoch 25/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8107 - loss: 8.0027
Epoch 25: val_loss did not improve from 4.39473
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8109 - loss: 7.9943 - val_accuracy: 0.8192 - val_loss: 4.9128 - learning_rate: 8.0000e-04
Epoch 26/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8258 - loss: 6.7683
Epoch 26: val_loss did not improve from 4.39473
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8259 - loss: 6.7793 - val_accuracy: 0.8309 - val_loss: 6.3827 - learning_rate: 8.0000e-04
Epoch 27/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.8278 - loss: 7.5027
Epoch 27: val_loss did not improve from 4.39473

Epoch 27: ReduceLROnPlateau reducing learning rate to 0.0005199999868636951.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8278 - loss: 7.5068 - val_accuracy: 0.8601 - val_loss: 7.5741 - learning_rate: 8.0000e-04
Epoch 28/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8374 - loss: 6.5003
Epoch 28: val_loss improved from 4.39473 to 3.24508, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8373 - loss: 6.4874 - val_accuracy: 0.8659 - val_loss: 3.2451 - learning_rate: 5.2000e-04
Epoch 29/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8393 - loss: 6.6087
Epoch 29: val_loss did not improve from 3.24508
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8392 - loss: 6.6148 - val_accuracy: 0.8776 - val_loss: 5.8127 - learning_rate: 5.2000e-04
Epoch 30/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8369 - loss: 6.9225
Epoch 30: val_loss did not improve from 3.24508
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8369 - loss: 6.9109 - val_accuracy: 0.8921 - val_loss: 4.1179 - learning_rate: 5.2000e-04
Epoch 31/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8423 - loss: 5.7378
Epoch 31: val_loss did not improve from 3.24508
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8421 - loss: 5.7366 - val_accuracy: 0.8863 - val_loss: 4.2921 - learning_rate: 5.2000e-04
Epoch 32/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8381 - loss: 6.5616
Epoch 32: val_loss did not improve from 3.24508
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8382 - loss: 6.5517 - val_accuracy: 0.8892 - val_loss: 4.1760 - learning_rate: 5.2000e-04
Epoch 33/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8470 - loss: 5.9248
Epoch 33: val_loss improved from 3.24508 to 3.10968, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 4s 42ms/step - accuracy: 0.8468 - loss: 5.9220 - val_accuracy: 0.8921 - val_loss: 3.1097 - learning_rate: 5.2000e-04
Epoch 34/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8420 - loss: 5.5085
Epoch 34: val_loss did not improve from 3.10968
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8421 - loss: 5.5138 - val_accuracy: 0.8863 - val_loss: 4.7243 - learning_rate: 5.2000e-04
Epoch 35/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8544 - loss: 5.4292
Epoch 35: val_loss did not improve from 3.10968
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8541 - loss: 5.4260 - val_accuracy: 0.8776 - val_loss: 3.7656 - learning_rate: 5.2000e-04
Epoch 36/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8415 - loss: 5.6864
Epoch 36: val_loss did not improve from 3.10968
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8416 - loss: 5.6903 - val_accuracy: 0.8659 - val_loss: 5.2461 - learning_rate: 5.2000e-04
Epoch 37/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8418 - loss: 5.8241
Epoch 37: val_loss did not improve from 3.10968
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8420 - loss: 5.8182 - val_accuracy: 0.8805 - val_loss: 3.8141 - learning_rate: 5.2000e-04
Epoch 38/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.8442 - loss: 5.3520
Epoch 38: val_loss did not improve from 3.10968

Epoch 38: ReduceLROnPlateau reducing learning rate to 0.0003380000009201467.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8442 - loss: 5.3551 - val_accuracy: 0.8542 - val_loss: 4.5850 - learning_rate: 5.2000e-04
Epoch 39/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8590 - loss: 5.0501
Epoch 39: val_loss improved from 3.10968 to 2.96368, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 5s 48ms/step - accuracy: 0.8588 - loss: 5.0505 - val_accuracy: 0.8717 - val_loss: 2.9637 - learning_rate: 3.3800e-04
Epoch 40/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8466 - loss: 4.9849
Epoch 40: val_loss did not improve from 2.96368
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8467 - loss: 4.9852 - val_accuracy: 0.8950 - val_loss: 2.9935 - learning_rate: 3.3800e-04
Epoch 41/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8538 - loss: 4.8843
Epoch 41: val_loss did not improve from 2.96368
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8539 - loss: 4.8809 - val_accuracy: 0.8863 - val_loss: 3.8110 - learning_rate: 3.3800e-04
Epoch 42/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8453 - loss: 4.7436
Epoch 42: val_loss did not improve from 2.96368
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8455 - loss: 4.7438 - val_accuracy: 0.8834 - val_loss: 2.9947 - learning_rate: 3.3800e-04
Epoch 43/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8619 - loss: 4.7259
Epoch 43: val_loss did not improve from 2.96368
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8619 - loss: 4.7253 - val_accuracy: 0.8805 - val_loss: 3.4255 - learning_rate: 3.3800e-04
Epoch 44/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8565 - loss: 5.2307
Epoch 44: val_loss did not improve from 2.96368

Epoch 44: ReduceLROnPlateau reducing learning rate to 0.00021970000816509127.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8566 - loss: 5.2346 - val_accuracy: 0.8688 - val_loss: 3.3162 - learning_rate: 3.3800e-04
Epoch 45/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8636 - loss: 4.5413
Epoch 45: val_loss improved from 2.96368 to 2.77632, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 5s 52ms/step - accuracy: 0.8635 - loss: 4.5399 - val_accuracy: 0.8746 - val_loss: 2.7763 - learning_rate: 2.1970e-04
Epoch 46/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8630 - loss: 4.5546
Epoch 46: val_loss did not improve from 2.77632
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8631 - loss: 4.5525 - val_accuracy: 0.8863 - val_loss: 2.8350 - learning_rate: 2.1970e-04
Epoch 47/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8631 - loss: 4.5181
Epoch 47: val_loss did not improve from 2.77632
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8631 - loss: 4.5253 - val_accuracy: 0.9067 - val_loss: 2.8909 - learning_rate: 2.1970e-04
Epoch 48/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8623 - loss: 4.8690
Epoch 48: val_loss did not improve from 2.77632
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8623 - loss: 4.8653 - val_accuracy: 0.9009 - val_loss: 3.1521 - learning_rate: 2.1970e-04
Epoch 49/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8673 - loss: 4.3649
Epoch 49: val_loss did not improve from 2.77632
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8673 - loss: 4.3639 - val_accuracy: 0.8921 - val_loss: 3.0881 - learning_rate: 2.1970e-04
Epoch 50/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8590 - loss: 4.5642
Epoch 50: val_loss did not improve from 2.77632

Epoch 50: ReduceLROnPlateau reducing learning rate to 0.0001428050090908073.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8591 - loss: 4.5627 - val_accuracy: 0.8863 - val_loss: 3.1290 - learning_rate: 2.1970e-04
Epoch 51/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8687 - loss: 4.3097
Epoch 51: val_loss improved from 2.77632 to 2.51120, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8687 - loss: 4.3091 - val_accuracy: 0.8776 - val_loss: 2.5112 - learning_rate: 1.4281e-04
Epoch 52/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8692 - loss: 4.0908
Epoch 52: val_loss did not improve from 2.51120
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8692 - loss: 4.0924 - val_accuracy: 0.8805 - val_loss: 2.8092 - learning_rate: 1.4281e-04
Epoch 53/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8739 - loss: 4.2070
Epoch 53: val_loss improved from 2.51120 to 2.47897, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8738 - loss: 4.2098 - val_accuracy: 0.8950 - val_loss: 2.4790 - learning_rate: 1.4281e-04
Epoch 54/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8753 - loss: 3.9069
Epoch 54: val_loss did not improve from 2.47897
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8753 - loss: 3.9092 - val_accuracy: 0.9067 - val_loss: 2.6902 - learning_rate: 1.4281e-04
Epoch 55/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8733 - loss: 4.2418
Epoch 55: val_loss did not improve from 2.47897
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8733 - loss: 4.2433 - val_accuracy: 0.8950 - val_loss: 2.9139 - learning_rate: 1.4281e-04
Epoch 56/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8633 - loss: 4.0361
Epoch 56: val_loss did not improve from 2.47897
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8634 - loss: 4.0389 - val_accuracy: 0.8776 - val_loss: 2.8956 - learning_rate: 1.4281e-04
Epoch 57/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8686 - loss: 4.2986
Epoch 57: val_loss did not improve from 2.47897
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8686 - loss: 4.2972 - val_accuracy: 0.8863 - val_loss: 3.9869 - learning_rate: 1.4281e-04
Epoch 58/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.8711 - loss: 4.1020
Epoch 58: val_loss did not improve from 2.47897

Epoch 58: ReduceLROnPlateau reducing learning rate to 9.282326063839719e-05.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8710 - loss: 4.0995 - val_accuracy: 0.9038 - val_loss: 2.5309 - learning_rate: 1.4281e-04
Epoch 59/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8656 - loss: 4.0259
Epoch 59: val_loss did not improve from 2.47897
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8657 - loss: 4.0240 - val_accuracy: 0.8980 - val_loss: 2.5523 - learning_rate: 9.2823e-05
Epoch 60/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8812 - loss: 3.8122
Epoch 60: val_loss improved from 2.47897 to 2.46816, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8810 - loss: 3.8140 - val_accuracy: 0.8980 - val_loss: 2.4682 - learning_rate: 9.2823e-05
Epoch 61/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8706 - loss: 3.8289
Epoch 61: val_loss improved from 2.46816 to 2.39652, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 34ms/step - accuracy: 0.8707 - loss: 3.8319 - val_accuracy: 0.9067 - val_loss: 2.3965 - learning_rate: 9.2823e-05
Epoch 62/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8731 - loss: 3.9521
Epoch 62: val_loss did not improve from 2.39652
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8730 - loss: 3.9500 - val_accuracy: 0.8950 - val_loss: 2.5407 - learning_rate: 9.2823e-05
Epoch 63/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8758 - loss: 3.9776
Epoch 63: val_loss improved from 2.39652 to 2.35033, saving model to Models/FacialKeyPoints_model_16-12-2025.keras
102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8758 - loss: 3.9766 - val_accuracy: 0.8892 - val_loss: 2.3503 - learning_rate: 9.2823e-05
Epoch 64/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8694 - loss: 3.9023
Epoch 64: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8694 - loss: 3.9044 - val_accuracy: 0.8863 - val_loss: 2.4499 - learning_rate: 9.2823e-05
Epoch 65/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8763 - loss: 4.0924
Epoch 65: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8763 - loss: 4.0896 - val_accuracy: 0.9096 - val_loss: 2.5051 - learning_rate: 9.2823e-05
Epoch 66/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8790 - loss: 3.7645
Epoch 66: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8791 - loss: 3.7633 - val_accuracy: 0.8980 - val_loss: 2.4585 - learning_rate: 9.2823e-05
Epoch 67/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8746 - loss: 3.8432
Epoch 67: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8747 - loss: 3.8459 - val_accuracy: 0.8892 - val_loss: 2.5714 - learning_rate: 9.2823e-05
Epoch 68/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8815 - loss: 4.0808
Epoch 68: val_loss did not improve from 2.35033

Epoch 68: ReduceLROnPlateau reducing learning rate to 6.033512036083267e-05.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8814 - loss: 4.0770 - val_accuracy: 0.9009 - val_loss: 2.6651 - learning_rate: 9.2823e-05
Epoch 69/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8842 - loss: 3.6016
Epoch 69: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8841 - loss: 3.6034 - val_accuracy: 0.8921 - val_loss: 2.4923 - learning_rate: 6.0335e-05
Epoch 70/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8763 - loss: 3.6564
Epoch 70: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8763 - loss: 3.6570 - val_accuracy: 0.9038 - val_loss: 2.4128 - learning_rate: 6.0335e-05
Epoch 71/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8832 - loss: 3.6022
Epoch 71: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8832 - loss: 3.6028 - val_accuracy: 0.8805 - val_loss: 2.4937 - learning_rate: 6.0335e-05
Epoch 72/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8820 - loss: 3.6392
Epoch 72: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8819 - loss: 3.6396 - val_accuracy: 0.9038 - val_loss: 2.7778 - learning_rate: 6.0335e-05
Epoch 73/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8769 - loss: 3.6124
Epoch 73: val_loss did not improve from 2.35033

Epoch 73: ReduceLROnPlateau reducing learning rate to 3.921782918041572e-05.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8768 - loss: 3.6138 - val_accuracy: 0.8921 - val_loss: 2.4349 - learning_rate: 6.0335e-05
Epoch 74/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8788 - loss: 3.5607
Epoch 74: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8787 - loss: 3.5615 - val_accuracy: 0.8834 - val_loss: 2.4719 - learning_rate: 3.9218e-05
Epoch 75/100
100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.8758 - loss: 3.5777
Epoch 75: val_loss did not improve from 2.35033
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8759 - loss: 3.5763 - val_accuracy: 0.8921 - val_loss: 2.6243 - learning_rate: 3.9218e-05
Epoch 75: early stopping
Restoring model weights from the end of the best epoch: 63.
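The learning-rate reductions visible in the log follow a ReduceLROnPlateau-style schedule; judging by the printed values the reduction factor is 0.65 (an inference from the log, not something stated in the code shown here). The arithmetic can be checked directly:

```python
def reduce_on_plateau(lr, factor=0.65):
    # one plateau reduction step: new_lr = old_lr * factor
    return lr * factor

lr = 8.0e-4
schedule = [lr]
for _ in range(5):
    lr = reduce_on_plateau(lr)
    schedule.append(lr)

# reproduces the values printed in the log (up to float noise):
# 8.0e-4, 5.2e-4, 3.38e-4, 2.197e-4, 1.42805e-4, 9.28233e-5
```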
No description has been provided for this image

Assessing the trained facial keypoint detection model's performance¶

# Evaluate the model
# The reference model from the course materials reached loss 8.3705 and accuracy 0.8528 on the same X_test, y_test set.

result = model_1_facialKeyPoints.evaluate(X_test_kp, y_test_kp)
54/54 ━━━━━━━━━━━━━━━━━━━━ 11s 78ms/step - accuracy: 0.8771 - loss: 2.5321
54/54 ━━━━━━━━━━━━━━━━━━━━ 4s 40ms/step
# @title Printing out samples of predictions
fig, axes = plt.subplots(4,4, figsize=(10,10))
axes = axes.ravel()

out_path = "docs/pics/kp_train_pred_grid.png"

for i in range(16):
  axes[i].imshow(X_test_kp[i].reshape(96,96), cmap='gray')
  axes[i].axis('off')
  # the 30 keypoint columns alternate x, y: plot each (x, y) pair
  for j in range(1,31,2):
      axes[i].plot(predicted_kp.iloc[i,j-1], predicted_kp.iloc[i,j], marker='.', color=kp_color)


fig.tight_layout()
fig.savefig(out_path, dpi=200, bbox_inches="tight", transparent=True)
plt.close(fig)

display(Image(filename=out_path,width=600))
No description has been provided for this image

Part 2. Facial Expression detection¶

In this second part of the project, I train a second model that classifies emotions. The data contains images belonging to 5 categories:

  • 0 = Angry
  • 1 = Disgust
  • 2 = Sad
  • 3 = Happy
  • 4 = Surprise

The images in this data set are 48 px × 48 px, so they need to be resized to 96 px × 96 px before the expression detection model can be combined with the facial keypoint detection model.

Below is an example of an original image, the result of resizing, and the final image after interpolation.

No description has been provided for this image
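A minimal sketch of the 48 → 96 px upsampling using nearest-neighbour replication in plain NumPy (the notebook itself may use a library resize with interpolation instead):

```python
import numpy as np

img = np.random.rand(48, 48).astype(np.float32)   # stand-in for one 48x48 grayscale face
# nearest-neighbour 2x upsample: repeat every row and every column twice
resized = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
print(resized.shape)  # (96, 96)
```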

Visualising the images in the dataset with the emotions¶

No description has been provided for this image

Below are the counts of each emotion category. The data is highly imbalanced, with very few images portraying disgust and a large number labeled happy.

No description has been provided for this image
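Imbalance like this is often countered with per-class weights in the loss. A sketch using the common "balanced" formula, n_samples / (n_classes × count) — the counts below are hypothetical stand-ins for the real ones shown in the bar chart:

```python
import numpy as np

# hypothetical per-class counts (0=Angry, 1=Disgust, 2=Sad, 3=Happy, 4=Surprise)
counts = np.array([4000, 450, 5000, 9000, 4000])
n_samples, n_classes = counts.sum(), len(counts)

# weight each class inversely to its frequency
class_weight = {i: n_samples / (n_classes * c) for i, c in enumerate(counts)}
# the rare 'Disgust' class gets by far the largest weight
```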

Data preparation and image augmentation¶

X shape (24568, 96, 96, 1)
y shape (24568, 5)
X train shape (22111, 96, 96, 1)
y train shape (22111, 5)
X val shape (1228, 96, 96, 1)
y val shape (1228, 5)
X test shape (1229, 96, 96, 1)
y test shape (1229, 5)

Data preprocessing¶

In the data preprocessing I will again normalize the data and perform image augmentation, as was done in Part 1 of the project.

First, I normalize the data to contain values between 0 and 1. Then, I use the following image augmentation techniques:

  1. rotating up to 15 degrees
  2. shifting the image horizontally up to 0.1 × image width
  3. shifting the image vertically up to 0.1 × image height
  4. shearing the image up to 0.1
  5. zooming the image up to 10 %
  6. horizontally flipping the image
  7. vertically flipping the image
  8. adjusting the brightness

The space outside the image boundaries is filled by replicating the nearest pixels.
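In Keras these transforms are typically configured through ImageDataGenerator. As a plain-NumPy illustration of one of them, a horizontal shift with "nearest" fill (replicating the edge pixels) can be sketched as follows — this is an illustration, not the notebook's actual code:

```python
import numpy as np

def shift_right(img, frac=0.1):
    # shift the image right by frac * width, filling the exposed
    # strip by replicating the nearest (leftmost) column
    shift = int(img.shape[1] * frac)
    out = np.empty_like(img)
    out[:, shift:] = img[:, :img.shape[1] - shift]
    out[:, :shift] = img[:, [0]]
    return out

img = np.arange(96 * 96, dtype=np.float32).reshape(96, 96)
shifted = shift_right(img)        # shift = 9 px for a 96 px wide image
```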

Build and train Deep Learning model for facial expression classification¶

The model I will build has the following architecture:

Emotion Detection model: INPUT → Zero padding → Conv2D → BatchNorm, ReLU → MaxPool2D → Res-block → Res-block → AveragePooling2D → Flatten → Dense, ReLU, Dropout → OUTPUT
# @title Emotion recognition model

# layer, model and initializer imports (may already be done earlier in the notebook)
from tensorflow.keras.layers import (Input, ZeroPadding2D, Conv2D, BatchNormalization,
                                     Activation, MaxPooling2D, AveragePooling2D, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import glorot_uniform

input_shape = (96,96,1)

# Input tensor
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# Stage 1
X = Conv2D(64, (7,7), strides = (2,2), name = 'conv1', kernel_initializer=glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)

# Stage 2
X = res_block(X, filter = [64,64,256], stage = 'res2')

# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res3')

# Stage 4 (optional)
#X = res_block(X, filter= [256,256,1024], stage = 'res4')

# Average pooling
X = AveragePooling2D((4,4), name = 'avg_pool')(X)

# Final layer
X = Flatten()(X)
X  = Dense(5, activation = 'softmax', name = 'dense', kernel_initializer=glorot_uniform(seed=0))(X)

Emotion_det_model_2 = Model(inputs = X_input, outputs = X, name = 'Resnet18')
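The spatial sizes through stage 1 can be sanity-checked with the standard convolution output formula, out = (n + 2p − k)//s + 1, assuming Keras' default 'valid' padding for the Conv2D and MaxPooling2D layers:

```python
def conv_out(n, k, s, p=0):
    # output size of a convolution/pooling layer with kernel k, stride s, padding p
    return (n + 2 * p - k) // s + 1

n = 96 + 2 * 3                      # ZeroPadding2D((3,3)) on a 96x96 input -> 102
n = conv_out(n, k=7, s=2)           # Conv2D 7x7, stride 2 -> 48
n = conv_out(n, k=3, s=2)           # MaxPooling2D 3x3, stride 2 -> 23
```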
Epoch 1/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 98ms/step - accuracy: 0.3861 - loss: 1.4138
Epoch 1: val_loss improved from inf to 1.41906, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 60s 111ms/step - accuracy: 0.3861 - loss: 1.4138 - val_accuracy: 0.3510 - val_loss: 1.4191 - learning_rate: 1.0000e-04
Epoch 2/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4018 - loss: 1.3722
Epoch 2: val_loss did not improve from 1.41906
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4018 - loss: 1.3722 - val_accuracy: 0.3420 - val_loss: 1.5051 - learning_rate: 1.0000e-04
Epoch 3/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4280 - loss: 1.3311
Epoch 3: val_loss did not improve from 1.41906
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4280 - loss: 1.3311 - val_accuracy: 0.3428 - val_loss: 1.4884 - learning_rate: 1.0000e-04
Epoch 4/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4427 - loss: 1.3050
Epoch 4: val_loss did not improve from 1.41906
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4427 - loss: 1.3050 - val_accuracy: 0.4560 - val_loss: 1.4508 - learning_rate: 1.0000e-04
Epoch 5/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4537 - loss: 1.2874
Epoch 5: val_loss did not improve from 1.41906
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4537 - loss: 1.2873 - val_accuracy: 0.3648 - val_loss: 1.5678 - learning_rate: 1.0000e-04
Epoch 6/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4715 - loss: 1.2507
Epoch 6: val_loss improved from 1.41906 to 1.27781, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4715 - loss: 1.2507 - val_accuracy: 0.4951 - val_loss: 1.2778 - learning_rate: 1.0000e-04
Epoch 7/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4722 - loss: 1.2431
Epoch 7: val_loss did not improve from 1.27781
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.4723 - loss: 1.2431 - val_accuracy: 0.4707 - val_loss: 1.3927 - learning_rate: 1.0000e-04
Epoch 8/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4861 - loss: 1.2247
Epoch 8: val_loss did not improve from 1.27781
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4861 - loss: 1.2247 - val_accuracy: 0.4145 - val_loss: 1.3872 - learning_rate: 1.0000e-04
Epoch 9/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4894 - loss: 1.2070
Epoch 9: val_loss improved from 1.27781 to 1.07033, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 75ms/step - accuracy: 0.4894 - loss: 1.2070 - val_accuracy: 0.5871 - val_loss: 1.0703 - learning_rate: 1.0000e-04
Epoch 10/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5040 - loss: 1.1821
Epoch 10: val_loss improved from 1.07033 to 0.97317, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5040 - loss: 1.1821 - val_accuracy: 0.6091 - val_loss: 0.9732 - learning_rate: 1.0000e-04
Epoch 11/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5164 - loss: 1.1603
Epoch 11: val_loss did not improve from 0.97317
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5164 - loss: 1.1603 - val_accuracy: 0.5904 - val_loss: 1.0113 - learning_rate: 1.0000e-04
Epoch 12/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5150 - loss: 1.1665
Epoch 12: val_loss did not improve from 0.97317
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5150 - loss: 1.1665 - val_accuracy: 0.5904 - val_loss: 1.0509 - learning_rate: 1.0000e-04
Epoch 13/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5153 - loss: 1.1494
Epoch 13: val_loss improved from 0.97317 to 0.94859, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5153 - loss: 1.1494 - val_accuracy: 0.6336 - val_loss: 0.9486 - learning_rate: 1.0000e-04
Epoch 14/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5239 - loss: 1.1388
Epoch 14: val_loss did not improve from 0.94859
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 71ms/step - accuracy: 0.5239 - loss: 1.1388 - val_accuracy: 0.5513 - val_loss: 1.0847 - learning_rate: 1.0000e-04
Epoch 15/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5191 - loss: 1.1377
Epoch 15: val_loss did not improve from 0.94859
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5192 - loss: 1.1377 - val_accuracy: 0.6344 - val_loss: 0.9714 - learning_rate: 1.0000e-04
Epoch 16/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5317 - loss: 1.1307
Epoch 16: val_loss did not improve from 0.94859
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5317 - loss: 1.1307 - val_accuracy: 0.5912 - val_loss: 1.0424 - learning_rate: 1.0000e-04
Epoch 17/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5284 - loss: 1.1242
Epoch 17: val_loss did not improve from 0.94859
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5284 - loss: 1.1242 - val_accuracy: 0.6067 - val_loss: 1.0344 - learning_rate: 1.0000e-04
Epoch 18/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5414 - loss: 1.1065
Epoch 18: val_loss improved from 0.94859 to 0.89885, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5414 - loss: 1.1065 - val_accuracy: 0.6466 - val_loss: 0.8989 - learning_rate: 1.0000e-04
Epoch 19/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5354 - loss: 1.1080
Epoch 19: val_loss did not improve from 0.89885
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5354 - loss: 1.1081 - val_accuracy: 0.5415 - val_loss: 1.1652 - learning_rate: 1.0000e-04
Epoch 20/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5408 - loss: 1.0957
Epoch 20: val_loss did not improve from 0.89885
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5408 - loss: 1.0958 - val_accuracy: 0.6523 - val_loss: 0.9121 - learning_rate: 1.0000e-04
Epoch 21/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5369 - loss: 1.0992
Epoch 21: val_loss did not improve from 0.89885
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5369 - loss: 1.0992 - val_accuracy: 0.6230 - val_loss: 0.9248 - learning_rate: 1.0000e-04
Epoch 22/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5462 - loss: 1.0941
Epoch 22: val_loss improved from 0.89885 to 0.83682, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5462 - loss: 1.0941 - val_accuracy: 0.6775 - val_loss: 0.8368 - learning_rate: 1.0000e-04
Epoch 23/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5493 - loss: 1.0809
Epoch 23: val_loss did not improve from 0.83682
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5493 - loss: 1.0809 - val_accuracy: 0.6311 - val_loss: 1.0064 - learning_rate: 1.0000e-04
Epoch 24/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5507 - loss: 1.0862
Epoch 24: val_loss did not improve from 0.83682
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5507 - loss: 1.0862 - val_accuracy: 0.6588 - val_loss: 0.8818 - learning_rate: 1.0000e-04
Epoch 25/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5484 - loss: 1.0819
Epoch 25: val_loss did not improve from 0.83682
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5484 - loss: 1.0819 - val_accuracy: 0.6686 - val_loss: 0.8533 - learning_rate: 1.0000e-04
Epoch 26/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5452 - loss: 1.0779
Epoch 26: val_loss did not improve from 0.83682
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5452 - loss: 1.0779 - val_accuracy: 0.6564 - val_loss: 0.9244 - learning_rate: 1.0000e-04
Epoch 27/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5593 - loss: 1.0614
Epoch 27: val_loss did not improve from 0.83682

Epoch 27: ReduceLROnPlateau reducing learning rate to 6.499999835796189e-05.
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5593 - loss: 1.0614 - val_accuracy: 0.6034 - val_loss: 1.0909 - learning_rate: 1.0000e-04
Epoch 28/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5602 - loss: 1.0567
Epoch 28: val_loss did not improve from 0.83682
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5602 - loss: 1.0567 - val_accuracy: 0.6531 - val_loss: 0.8728 - learning_rate: 6.5000e-05
Epoch 29/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5572 - loss: 1.0452
Epoch 29: val_loss improved from 0.83682 to 0.79482, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5572 - loss: 1.0452 - val_accuracy: 0.7052 - val_loss: 0.7948 - learning_rate: 6.5000e-05
Epoch 30/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5607 - loss: 1.0446
Epoch 30: val_loss did not improve from 0.79482
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5607 - loss: 1.0446 - val_accuracy: 0.6450 - val_loss: 0.9357 - learning_rate: 6.5000e-05
Epoch 31/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5652 - loss: 1.0389
Epoch 31: val_loss improved from 0.79482 to 0.74947, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5652 - loss: 1.0389 - val_accuracy: 0.7166 - val_loss: 0.7495 - learning_rate: 6.5000e-05
Epoch 32/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5692 - loss: 1.0364
Epoch 32: val_loss did not improve from 0.74947
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5692 - loss: 1.0364 - val_accuracy: 0.7044 - val_loss: 0.7778 - learning_rate: 6.5000e-05
Epoch 33/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5628 - loss: 1.0403
Epoch 33: val_loss did not improve from 0.74947
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5628 - loss: 1.0403 - val_accuracy: 0.6694 - val_loss: 0.8482 - learning_rate: 6.5000e-05
Epoch 34/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5724 - loss: 1.0341
Epoch 34: val_loss did not improve from 0.74947
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5724 - loss: 1.0342 - val_accuracy: 0.7191 - val_loss: 0.7568 - learning_rate: 6.5000e-05
Epoch 35/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5729 - loss: 1.0349
Epoch 35: val_loss did not improve from 0.74947
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5729 - loss: 1.0349 - val_accuracy: 0.6987 - val_loss: 0.7701 - learning_rate: 6.5000e-05
Epoch 36/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5737 - loss: 1.0287
Epoch 36: val_loss improved from 0.74947 to 0.73751, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5737 - loss: 1.0287 - val_accuracy: 0.7215 - val_loss: 0.7375 - learning_rate: 6.5000e-05
Epoch 37/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5734 - loss: 1.0301
Epoch 37: val_loss improved from 0.73751 to 0.72995, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5734 - loss: 1.0301 - val_accuracy: 0.7337 - val_loss: 0.7299 - learning_rate: 6.5000e-05
Epoch 38/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5726 - loss: 1.0256
Epoch 38: val_loss did not improve from 0.72995
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5726 - loss: 1.0256 - val_accuracy: 0.7305 - val_loss: 0.7316 - learning_rate: 6.5000e-05
Epoch 39/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5706 - loss: 1.0234
Epoch 39: val_loss did not improve from 0.72995
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5706 - loss: 1.0233 - val_accuracy: 0.6987 - val_loss: 0.8048 - learning_rate: 6.5000e-05
Epoch 40/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5739 - loss: 1.0210
Epoch 40: val_loss did not improve from 0.72995
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5739 - loss: 1.0210 - val_accuracy: 0.7215 - val_loss: 0.7553 - learning_rate: 6.5000e-05
Epoch 41/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5833 - loss: 1.0068
Epoch 41: val_loss did not improve from 0.72995
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 74ms/step - accuracy: 0.5833 - loss: 1.0069 - val_accuracy: 0.7101 - val_loss: 0.8168 - learning_rate: 6.5000e-05
Epoch 42/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5816 - loss: 1.0073
Epoch 42: val_loss did not improve from 0.72995

Epoch 42: ReduceLROnPlateau reducing learning rate to 4.2250000115018337e-05.
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5816 - loss: 1.0074 - val_accuracy: 0.6995 - val_loss: 0.7831 - learning_rate: 6.5000e-05
Epoch 43/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5826 - loss: 1.0035
Epoch 43: val_loss did not improve from 0.72995
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5825 - loss: 1.0035 - val_accuracy: 0.6963 - val_loss: 0.8011 - learning_rate: 4.2250e-05
Epoch 44/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5759 - loss: 1.0042
Epoch 44: val_loss improved from 0.72995 to 0.70902, saving model to Models/Emotion_det_model_16-12-2025.keras
346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5759 - loss: 1.0042 - val_accuracy: 0.7378 - val_loss: 0.7090 - learning_rate: 4.2250e-05
Epoch 45/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5769 - loss: 1.0099
Epoch 45: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5769 - loss: 1.0099 - val_accuracy: 0.7280 - val_loss: 0.7477 - learning_rate: 4.2250e-05
Epoch 46/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5754 - loss: 1.0030
Epoch 46: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5754 - loss: 1.0030 - val_accuracy: 0.7321 - val_loss: 0.7268 - learning_rate: 4.2250e-05
Epoch 47/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5818 - loss: 1.0055
Epoch 47: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5818 - loss: 1.0055 - val_accuracy: 0.7248 - val_loss: 0.7487 - learning_rate: 4.2250e-05
Epoch 48/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5810 - loss: 0.9977
Epoch 48: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5810 - loss: 0.9977 - val_accuracy: 0.7264 - val_loss: 0.7410 - learning_rate: 4.2250e-05
Epoch 49/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5873 - loss: 0.9896
Epoch 49: val_loss did not improve from 0.70902

Epoch 49: ReduceLROnPlateau reducing learning rate to 2.746250102063641e-05.
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5873 - loss: 0.9896 - val_accuracy: 0.7565 - val_loss: 0.7099 - learning_rate: 4.2250e-05
Epoch 50/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5859 - loss: 0.9827
Epoch 50: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5859 - loss: 0.9827 - val_accuracy: 0.7508 - val_loss: 0.7094 - learning_rate: 2.7463e-05
Epoch 51/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5882 - loss: 0.9861
Epoch 51: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5881 - loss: 0.9861 - val_accuracy: 0.7288 - val_loss: 0.7217 - learning_rate: 2.7463e-05
Epoch 52/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5864 - loss: 0.9916
Epoch 52: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5864 - loss: 0.9916 - val_accuracy: 0.7280 - val_loss: 0.7329 - learning_rate: 2.7463e-05
Epoch 53/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5932 - loss: 0.9836
Epoch 53: val_loss did not improve from 0.70902
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5931 - loss: 0.9836 - val_accuracy: 0.7435 - val_loss: 0.7125 - learning_rate: 2.7463e-05
Epoch 54/100
346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5933 - loss: 0.9751
Epoch 54: val_loss did not improve from 0.70902

Epoch 54: ReduceLROnPlateau reducing learning rate to 1.785062613635091e-05.
346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5933 - loss: 0.9751 - val_accuracy: 0.7272 - val_loss: 0.7548 - learning_rate: 2.7463e-05
Epoch 54: early stopping
Training samples: 22111
Batch size: 64
Steps per epoch: 346
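The three learning-rate drops in the log above (at epochs 42, 49, and 54) are consistent with a `ReduceLROnPlateau` factor of 0.65. A minimal sketch reconstructing the schedule — the factor is inferred from the logged values, not taken from the training script:

```python
# Reconstructing the learning-rate schedule implied by the log.
# Hypothetical: the actual ReduceLROnPlateau settings are not shown in
# this excerpt; the factor 0.65 is inferred from the logged values.
factor = 0.65
lr = 6.5e-05  # learning rate at the start of this excerpt

# The three reductions logged at epochs 42, 49, and 54:
logged = [4.225e-05, 2.74625e-05, 1.7850625e-05]

for expected in logged:
    lr *= factor  # Keras applies new_lr = lr * factor on each plateau
    assert abs(lr - expected) < 1e-10, (lr, expected)

print("a factor of 0.65 reproduces every reduction in the log")
```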

Evaluate model¶

Confusion matrix, accuracy, precision, and recall
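All of these metrics derive from the confusion matrix, whose rows are true classes and columns are predicted classes. A self-contained sketch with toy labels — `y_true` and `y_pred` here are stand-ins, not the notebook's actual test set:

```python
import numpy as np

# Toy stand-ins for the test-set labels and predictions; the notebook
# obtains its predictions from the trained emotion model instead.
y_true = np.array([0, 0, 1, 2, 2, 3, 3, 3, 4, 4])
y_pred = np.array([0, 2, 1, 2, 2, 3, 3, 0, 4, 4])

n_classes = 5
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1  # rows: true class, columns: predicted class

accuracy = np.trace(cm) / cm.sum()                    # correct / total
precision = np.diag(cm) / cm.sum(axis=0).clip(min=1)  # per predicted class
recall = np.diag(cm) / cm.sum(axis=1).clip(min=1)     # per true class

print(accuracy)  # 0.8 for these toy labels
```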

No description has been provided for this image
39/39 ━━━━━━━━━━━━━━━━━━━━ 4s 35ms/step - accuracy: 0.7671 - loss: 0.6242
No description has been provided for this image
No description has been provided for this image
Classification report for emotion detection model:
              precision    recall  f1-score   support

           0       0.64      0.64      0.64       245
           1       0.89      0.36      0.52        22
           2       0.66      0.76      0.70       319
           3       0.87      0.81      0.84       458
           4       0.85      0.83      0.84       185

    accuracy                           0.76      1229
   macro avg       0.78      0.68      0.71      1229
weighted avg       0.77      0.76      0.76      1229

The table above shows that the classes with the least data (see the support column) have the weakest performance. Precision (the fraction of samples predicted as class x that actually belong to class x) and recall (the fraction of class-x samples that are correctly labeled as x) are both high for class 3, which also has the most samples. The F1 score is the harmonic mean of precision and recall, calculated as

$F_1 = 2 \cdot \frac{\text{precision} \ \times \ \text{recall}}{\text{precision} \ + \ \text{recall}}$
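As a sanity check, the per-class F1 values in the report can be reproduced from the rounded precision and recall columns, and the weighted average from the support counts:

```python
# Sanity-checking the classification report: each F1 is the harmonic
# mean 2pr / (p + r) of that class's precision p and recall r.
report = {  # class: (precision, recall, f1, support), as printed above
    0: (0.64, 0.64, 0.64, 245),
    1: (0.89, 0.36, 0.52, 22),
    2: (0.66, 0.76, 0.70, 319),
    3: (0.87, 0.81, 0.84, 458),
    4: (0.85, 0.83, 0.84, 185),
}

for cls, (p, r, f1, _) in report.items():
    recomputed = 2 * p * r / (p + r)
    # agrees with the printed F1 to within rounding of the 2-decimal inputs
    assert abs(recomputed - f1) <= 0.01, (cls, recomputed, f1)

# The weighted average weights each class's metric by its support:
total = sum(s for *_, s in report.values())
weighted_p = sum(p * s for p, _, _, s in report.values()) / total
print(round(weighted_p, 2))  # 0.77, matching the report
```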

Part 3. Combining the key point detection and facial expression recognition models¶

39/39 ━━━━━━━━━━━━━━━━━━━━ 4s 93ms/step
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y emotion
0 64.648270 39.462273 35.029705 27.303909 57.424828 38.391609 71.964722 43.019527 41.132412 31.116213 ... 52.116562 51.100479 69.342575 22.495335 57.190903 35.552544 63.602200 34.299168 67.525764 3
1 67.049782 37.382389 30.436636 31.989122 59.316116 38.093258 74.362892 38.955746 37.369408 34.813652 ... 60.451092 57.179199 80.493698 28.356564 76.011620 42.261841 78.572998 41.904636 82.303894 2
2 64.116463 36.904625 34.684856 35.691677 57.599121 37.935898 70.917572 37.860672 40.516914 37.410923 ... 59.986973 58.181126 76.441887 33.698921 75.366005 46.023739 74.968422 45.919083 78.991425 2
3 63.984192 36.043690 27.817284 38.471413 54.756351 37.784523 74.124146 36.159767 35.542854 38.942562 ... 52.165367 71.062103 59.365967 28.763102 60.063107 46.585697 60.998100 47.175484 65.630623 3
4 63.373016 40.186924 29.947830 38.409580 56.042046 41.237423 70.940956 41.500175 37.221470 40.498299 ... 60.592480 60.439510 77.742844 29.692043 76.036491 45.099701 77.343796 45.182957 79.698212 0

5 rows × 31 columns
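The table above interleaves the two models' outputs: 30 keypoint coordinates plus one predicted emotion label per image. A minimal sketch of how such a frame can be assembled — random stand-ins replace the real model predictions, and the column names are placeholders:

```python
import numpy as np
import pandas as pd

# Random stand-ins for the two models' predictions (the notebook would
# instead call the keypoint and emotion models on the test images).
rng = np.random.default_rng(0)
n_images = 5

keypoints = rng.uniform(0, 96, size=(n_images, 30))       # 15 (x, y) pairs
emotion_probs = rng.dirichlet(np.ones(5), size=n_images)  # softmax-like scores

# Placeholder column names; the real ones are left_eye_center_x, ...,
# mouth_center_bottom_lip_y as shown in the table above.
keypoint_cols = [f"kp_{i}" for i in range(30)]

combined = pd.DataFrame(keypoints, columns=keypoint_cols)
combined["emotion"] = emotion_probs.argmax(axis=1)  # predicted class 0-4

print(combined.shape)  # (5, 31): 30 keypoint columns + the emotion label
```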

Finally, we plot test images annotated by the combined model. Neither model has seen these images during training.

No description has been provided for this image